
encukou
Member

@encukou encukou commented Apr 5, 2024

To test the errors argument, we read a UTF-16 file as UTF-8 with "backslashreplace" error handling. However, the utf-16 codec adds an endian-specific byte-order mark, so on big-endian machines the expectation doesn't match the test file (which was saved on a little-endian machine).

Use endswith to ignore the BOM.
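A minimal Python sketch of the mismatch described above. It assumes the data file holds the text "Hello, UTF-16 world!\n" as written by the utf-16 codec on a little-endian machine. TEXT, FILE_BYTES and read_as_utf8 are hypothetical names, and this is not the actual test code. It spells out both host-dependent expectations explicitly, so it behaves the same on any machine. The last step shows one way to build a host-independent expectation: pin the byte order and compare only the tail.

```python
# Assumed contents of the test data file (hypothetical constant).
TEXT = "Hello, UTF-16 world!\n"

# Stand-in for the file on disk: what the "utf-16" codec writes on a
# little-endian machine, i.e. the BOM FF FE followed by little-endian units.
FILE_BYTES = b"\xff\xfe" + TEXT.encode("utf-16-le")


def read_as_utf8(data: bytes) -> str:
    """Mimic the test: decode UTF-16 bytes as UTF-8 with backslashreplace."""
    return data.decode("utf-8", errors="backslashreplace")


result = read_as_utf8(FILE_BYTES)

# The invalid BOM bytes come through as escaped text; the ASCII bytes and the
# NUL high bytes of each code unit are valid UTF-8 and decode as-is.
assert result.startswith("\\xff\\xfe")

# An expectation built with TEXT.encode("utf-16") depends on the host:
# little-endian machines give FF FE + LE units, big-endian machines give
# FE FF + BE units. Both cases are spelled out explicitly here.
little_endian_host = read_as_utf8(b"\xff\xfe" + TEXT.encode("utf-16-le"))
big_endian_host = read_as_utf8(b"\xfe\xff" + TEXT.encode("utf-16-be"))
assert result == little_endian_host
assert result != big_endian_host  # mismatch on big-endian hosts such as s390x

# Pinning the byte order with "utf-16-le" (no BOM) and comparing only the
# end of the string gives an expectation that is the same on every host.
portable = read_as_utf8(TEXT.encode("utf-16-le"))
assert result.endswith(portable)
```

Note that endswith alone would not be enough if the expectation were still built with the native "utf-16" codec. On a big-endian host, the body bytes would also be in the wrong order, not just the BOM.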

@encukou encukou requested review from jaraco, warsaw and FFY00 as code owners April 5, 2024 13:46
@bedevere-app bedevere-app bot added the tests (Tests in the Lib/test dir) and awaiting core review labels Apr 5, 2024
@encukou encukou changed the title gh-116609: Ignore UTF-16 BOM in importlib.resources._functional tests gh-116608: Ignore UTF-16 BOM in importlib.resources._functional tests Apr 5, 2024
@encukou
Member Author

encukou commented Apr 5, 2024

!buildbot s390x

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @encukou for commit 26ae210 🤖

The command will test the builders whose names match the following regular expression: s390x

The builders matched are:

  • s390x Fedora Rawhide Clang Installed PR
  • s390x Fedora Rawhide Clang PR
  • s390x Fedora LTO PR
  • s390x Fedora Refleaks PR
  • s390x RHEL7 LTO + PGO PR
  • s390x Fedora LTO + PGO PR
  • s390x Fedora Clang PR
  • s390x Fedora PR
  • s390x Fedora Rawhide LTO PR
  • s390x Fedora Rawhide PR
  • s390x RHEL8 LTO PR
  • s390x Fedora Rawhide Refleaks PR
  • s390x Fedora Clang Installed PR
  • s390x RHEL8 PR
  • s390x Fedora Rawhide LTO + PGO PR
  • s390x RHEL8 Refleaks PR
  • s390x RHEL8 LTO + PGO PR
  • s390x RHEL7 PR
  • s390x RHEL7 LTO PR
  • s390x RHEL7 Refleaks PR
  • s390x SLES PR
  • s390x Debian PR

@encukou encukou merged commit 4d4a6f1 into python:main Apr 5, 2024
@encukou encukou deleted the importlib-tests-be branch April 5, 2024 15:00
@zooba
Member

zooba commented Apr 8, 2024

@encukou Out of interest, was the endswith necessary? I thought using utf-16-le would strip the BOM automatically, and the issue you were hitting is that utf-16-be (implied by utf-16 on BE machines) was rejecting it. Explicitly specifying -le should have worked, I'd thought.
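For reference, here is a small sketch (plain CPython, no test code involved) of how the standard UTF-16 codecs handle a BOM. Under the setup described in this PR, the file is read as UTF-8, so its FF FE bytes never reach a UTF-16 decoder. Instead they appear in the result as the escaped text \xff\xfe, which a BOM-less utf-16-le expectation does not contain. The string "hi" is only an illustrative value.

```python
# Two UTF-16 code units for "hi", preceded by a little-endian BOM.
data = b"\xff\xfe" + "hi".encode("utf-16-le")

# The generic "utf-16" decoder consumes a leading BOM and uses it to pick
# the byte order.
assert data.decode("utf-16") == "hi"

# The explicit "utf-16-le" decoder keeps the BOM as an ordinary U+FEFF.
assert data.decode("utf-16-le") == "\ufeffhi"

# On encoding, "utf-16-le" writes no BOM; "utf-16" prepends one in the
# host's native byte order.
assert "hi".encode("utf-16-le") == b"h\x00i\x00"
assert "hi".encode("utf-16")[:2] in (b"\xff\xfe", b"\xfe\xff")

# Read as UTF-8 (as in the test), the BOM survives as escaped text.
assert data.decode("utf-8", errors="backslashreplace") == "\\xff\\xfeh\x00i\x00"
```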

diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
… tests (pythonGH-117569)

pythongh-116609: Ignore UTF-16 BOM in importlib.resources._functional tests

@jaraco
Member

jaraco commented Aug 14, 2024

This change needs to be applied to importlib_resources. It looks like a related issue was reported in python/importlib_resources#312.

Labels: skip news, tests (Tests in the Lib/test dir)
4 participants